Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data

Authors

  • Christopher Olston
  • Jennifer Widom
Abstract

Strict consistency of replicated data is infeasible or not required by many distributed applications, so current systems often permit stale replication, in which cached copies of data values are allowed to become out of date. Queries over cached data return an answer quickly, but the stale answer may be unboundedly imprecise. Alternatively, queries over remote master data return a precise answer, but with potentially poor performance. To bridge the gap between these two extremes, we propose a new class of replication systems called TRAPP (Tradeoff in Replication Precision and Performance). TRAPP systems give each user fine-grained control over the tradeoff between precision and performance: Caches store ranges that are guaranteed to bound the current data values, instead of storing stale exact values. Users supply a quantitative precision constraint along with each query. To answer a query, TRAPP systems automatically select a combination of locally cached bounds and exact master data stored remotely to deliver a bounded answer consisting of a range that is no wider than the specified precision constraint, that is guaranteed to contain the precise answer, and that is computed as quickly as possible. This paper defines the architecture of TRAPP replication systems and covers some mechanics of caching data ranges. It then focuses on queries with aggregation, presenting optimization algorithms for answering queries with precision constraints, and reporting on performance experiments that demonstrate the fine-grained control of the precision-performance tradeoff offered by TRAPP systems.
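
The idea of answering an aggregation query from cached ranges plus a few exact master values can be illustrated with a minimal Python sketch for a SUM query. It is not the optimization algorithm of the paper: the names `Bound`, `bounded_sum`, and `fetch_exact` are hypothetical, every remote fetch is assumed to cost the same, and the widest-first refresh order is a simple greedy heuristic chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Bound:
    """Cached range assumed to contain the current master value."""
    low: float
    high: float

    @property
    def width(self) -> float:
        return self.high - self.low


def bounded_sum(
    cache: Dict[str, Bound],
    fetch_exact: Callable[[str], float],
    max_width: float,
) -> Tuple[float, float, List[str]]:
    """Return (low, high, refreshed_keys) for SUM with high - low <= max_width.

    fetch_exact is a hypothetical stand-in for a remote read of master data.
    """
    # Work on a copy so the caller's cache is not mutated.
    bounds = {k: Bound(b.low, b.high) for k, b in cache.items()}
    refreshed: List[str] = []
    # Assumed heuristic: refresh the widest cached ranges first.
    for key in sorted(bounds, key=lambda k: bounds[k].width, reverse=True):
        total_width = sum(b.width for b in bounds.values())
        if total_width <= max_width:
            break  # the precision constraint is already met
        exact = fetch_exact(key)
        bounds[key] = Bound(exact, exact)  # an exact value has zero width
        refreshed.append(key)
    low = sum(b.low for b in bounds.values())
    high = sum(b.high for b in bounds.values())
    return low, high, refreshed
```

A loose precision constraint lets the query finish from the cache alone, while a constraint of zero forces every value to be fetched from the master copy; values in between trade remote fetches for a narrower answer range.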

Related resources

Compact Representations of Event Sequences

We introduce a new technique for the efficient management of large sequences of multidimensional data, which takes advantage of regularities that arise in real-world datasets and supports different types of aggregation queries. More importantly, our representation is flexible in the sense that the relevant dimensions and queries may be used to guide the construction process, easily providing a ...

Energy-Conscious Data Aggregation Over Large-Scale Sensor Networks

Recent advances in hardware technology facilitate applications requiring large numbers of sensor devices, where each sensor device has computational, storage, and communication capabilities. Since sensor devices are powered by ordinary batteries, power is a limiting resource in sensor networks. Power usage can be reduced by pushing part of the computation into the network to reduce communicatio...

Power-aware Query Processing over Sensor Networks

Recent advances in hardware technology make applications requiring large numbers of sensor devices possible, where each sensor device has computation, memory, and communication capabilities. Since sensor devices are powered by ordinary batteries, power is a limiting resource in sensor networks. Some work has been proposed to reduce the power usage by pushing part of the computation into the net...

External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages

With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. Applying search engines is one of the plagiarism detection techniques that converts plagiarism patterns to search queries. Generating suitable queries is the heart of this technique and existing methods suffer from lack of producing accurate queries, Precision and Speed of retrieved result...

SMART: Adaptive Precision Setting for Aggregation Queries over Distributed Data Streams

We present SMART, a load-aware, self-tuning algorithm for processing continuous aggregate queries in distributed data stream systems. SMART maximizes query result accuracy while keeping monitoring bandwidth below a specified budget despite potentially bursty data streams whose workload characteristics change over time. To accomplish this goal, SMART’s hierarchical algorithm computes for each no...

Publication date: 2000